NEOCR: A Configurable Dataset for Natural Image Text Recognition
Abstract
Recently, growing attention has been paid to recognizing text in natural images. Natural image text OCR is far more complex than OCR in scanned documents. Text in real-world environments appears in arbitrary colors, font sizes and font types, and is often affected by perspective distortion, lighting effects, textures or occlusion. Currently, no publicly available dataset covers all aspects of natural image OCR. We propose a comprehensive, well-annotated, configurable dataset for optical character recognition in natural images, intended for the evaluation and comparison of approaches to natural image text OCR. Based on the rich annotations of the proposed NEOCR dataset, new and more precise evaluations are now possible, which give more detailed information on where improvements are most needed in natural image text OCR.
Related resources
Natural scene text localization using edge color signature
Localizing text regions in images taken from natural scenes is one of the challenging problems due to variations in font, size, color and orientation of text. In this paper, we introduce a new concept, the so-called Edge Color Signature, for localizing text regions in an image. This method is able to localize both Farsi and English texts. In the proposed method, first a pyramid using diff...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Localization and Recognition of Text with Perspective Distortion in Natural Scenes
Recognizing text in natural scene images refers to the problem of identifying the words present in them. Scene text recognition is very difficult for several reasons: images contain very little linguistic context, interpreting versions of letters and digits is required, and scene text can appear in any orientation. Most of the existing works are ...
Text Recognition in Natural Images using Multiclass Hough Forests
Text detection and recognition in natural images are popular yet unsolved problems in computer vision. In this paper, we propose a technique that attempts to detect and recognize text in a unified manner by searching for words directly without reducing the image into text regions or individual characters. We present three contributions. First, we modify an object detection framework called Houg...
Text Localization and Character Extraction in Natural Scene Images using Contourlet Transform and SVM Classifier
The objective of this study is to propose a new method for text region localization and character extraction in natural scene images with complex backgrounds. In this paper, a hybrid methodology is suggested which extracts multilingual text from natural scene images with cluttered backgrounds. The proposed approach involves four steps. First, potential text regions in an image are extracted based...
Publication date: 2011